21 research outputs found

Optimal Completion of Incomplete Gene Trees in Polynomial Time Using OCTAL

Author: Christensen Sarah
Molloy Erin K.
Vachaspati Pranjal
Warnow Tandy
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 17th International Workshop on Algorithms in Bioinformatics (WABI 2017)
Publication date: 01/01/2017

Here we introduce the Optimal Tree Completion Problem, a general optimization problem that involves completing an unrooted binary tree (i.e., adding missing leaves) so as to minimize its distance from a reference tree on a superset of the leaves. More formally, given a pair of unrooted binary trees (T, t) where T has leaf set S and t has leaf set R, a subset of S, we wish to add all the leaves from S \ R to t so as to produce a new tree t' on leaf set S that has the minimum distance to T. We show that when the distance is defined by the Robinson-Foulds (RF) distance, an optimal solution can be found in polynomial time. We also present OCTAL, an algorithm that solves this RF Optimal Tree Completion Problem exactly in quadratic time. We report on a simulation study where we complete estimated gene trees using a reference tree that is based on a species tree estimated from a multi-locus dataset. OCTAL produces completed gene trees that are closer to the true gene trees than an existing heuristic approach, but the accuracy of the completed gene trees computed by OCTAL depends on how topologically similar the estimated species tree is to the true gene tree. Hence, under conditions with relatively low gene tree heterogeneity, OCTAL can be used to provide highly accurate completions of estimated gene trees. We close with a discussion of future research.
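
To make the problem statement concrete, here is a minimal Python sketch of the RF distance and of tree completion by exhaustive search. It is not OCTAL: the abstract gives no algorithmic details, so this brute-force version simply tries every placement of the missing leaves and is exponential in their number. The nested-tuple tree encoding and all function names are assumptions made for illustration.

```python
# Illustrative only: trees are nested 2-tuples of leaf-name strings,
# rooted arbitrarily; RF distance is computed on the unrooted topology.

def leaves(tree):
    """Return the leaf set of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))

def bipartitions(tree):
    """Return the non-trivial bipartitions, each stored as the side
    that does not contain a fixed anchor leaf."""
    all_leaves = leaves(tree)
    anchor = min(all_leaves)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - below if anchor in below else below
        if 2 <= len(side) <= len(all_leaves) - 2:
            splits.add(side)
        return below

    visit(tree)
    return splits

def rf_distance(t1, t2):
    """Robinson-Foulds distance: bipartitions found in exactly one tree."""
    if leaves(t1) != leaves(t2):
        raise ValueError("trees must have the same leaf set")
    return len(bipartitions(t1) ^ bipartitions(t2))

def insertions(tree, new_leaf):
    """Yield every tree obtained by attaching new_leaf to one edge."""
    yield (tree, new_leaf)  # attach on the edge above this node
    if not isinstance(tree, str):
        for i, child in enumerate(tree):
            for grown in insertions(child, new_leaf):
                yield tree[:i] + (grown,) + tree[i + 1:]

def complete_brute_force(T, t):
    """Try all completions of t on the leaf set of T; return the closest."""
    candidates = [t]
    for leaf in sorted(leaves(T) - leaves(t)):
        candidates = [g for c in candidates for g in insertions(c, leaf)]
    return min(candidates, key=lambda c: rf_distance(T, c))

reference = (("A", "B"), ("C", ("D", "E")))   # T on leaf set S
incomplete = (("A", "B"), ("C", "D"))         # t on leaf set R, missing E
completed = complete_brute_force(reference, incomplete)
```

Because every tree on S that restricts to t can be built by adding the missing leaves one at a time, the exhaustive search returns a true optimum, but only for very small inputs.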

Dagstuhl Research Online Publication Server

TRACTION: Fast Non-Parametric Improvement of Estimated Gene Trees

Author: Christensen Sarah
Molloy Erin K.
Vachaspati Pranjal
Warnow Tandy
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 19th International Workshop on Algorithms in Bioinformatics (WABI 2019)
Publication date: 01/01/2019

Gene tree correction aims to improve the accuracy of a gene tree by using computational techniques along with a reference tree (and in some cases available sequence data). It is an active area of research when dealing with gene tree heterogeneity due to duplication and loss (GDL). Here, we study the problem of gene tree correction where gene tree heterogeneity is instead due to incomplete lineage sorting (ILS, a common problem in eukaryotic phylogenetics) and horizontal gene transfer (HGT, a common problem in bacterial phylogenetics). We introduce TRACTION, a simple polynomial time method that provably finds an optimal solution to the RF-Optimal Tree Refinement and Completion Problem, which seeks a refinement and completion of an input tree t with respect to a given binary tree T so as to minimize the Robinson-Foulds (RF) distance. We present the results of an extensive simulation study evaluating TRACTION within gene tree correction pipelines on 68,000 estimated gene trees, using estimated species trees as reference trees. We explore accuracy under conditions with varying levels of gene tree heterogeneity due to ILS and HGT. We show that TRACTION matches or improves the accuracy of well-established methods from the GDL literature under conditions with HGT and ILS, and ties for best under the ILS-only conditions. Furthermore, TRACTION ties for fastest on these datasets. TRACTION is available at https://github.com/pranjalv123/TRACTION-RF and the study datasets are available at https://doi.org/10.13012/B2IDB-1747658_V1
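
The refinement half of this problem can be illustrated with bipartitions. The sketch below assumes both trees are on the same leaf set and are given as sets of bipartitions; it adds to the unresolved tree t every bipartition of the binary tree T that is compatible with all of t's bipartitions. It relies on the standard fact that two bipartitions are compatible exactly when at least one of the four pairwise intersections of their sides is empty. The function names and the set-based encoding are assumptions for illustration, not the TRACTION implementation.

```python
# Illustrative only: a bipartition is stored as a frozenset holding the
# side that does not contain the smallest leaf name.

def normalize(side, all_leaves):
    """Store a bipartition by the side that excludes the anchor leaf."""
    side = frozenset(side)
    return all_leaves - side if min(all_leaves) in side else side

def compatible(a, b, all_leaves):
    """True if bipartitions a and b can occur in the same tree."""
    a_sides = (a, all_leaves - a)
    b_sides = (b, all_leaves - b)
    return any(not (x & y) for x in a_sides for y in b_sides)

def refine_splits(t_splits, T_splits, all_leaves):
    """Add every bipartition of T that conflicts with none in t."""
    added = {
        s for s in T_splits
        if all(compatible(s, u, all_leaves) for u in t_splits)
    }
    return set(t_splits) | added

all_leaves = frozenset("ABCDEF")
# Binary reference tree T: AB|CDEF, ABC|DEF, ABCD|EF.
T_splits = {normalize(s, all_leaves) for s in ("AB", "ABC", "EF")}
# Unresolved input tree t with the single bipartition CD|ABEF.
t_splits = {normalize("CD", all_leaves)}
refined = refine_splits(t_splits, T_splits, all_leaves)
```

In this example ABC|DEF conflicts with CD|ABEF and is skipped, while the other two bipartitions of T are added, which leaves the refined tree fully resolved.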

Dagstuhl Research Online Publication Server

ASTRID: Accurate Species TRees from Internode Distances

Author: Pranjal Vachaspati
Tandy Warnow
Publication venue: Springer Science and Business Media LLC

Crossref

Large scale phylogenomic estimation

Author: Vachaspati Pranjal
Publication date: 01/12/2019

Phylogenomic estimation - the science of calculating evolutionary trees from genomic data - is an important biological problem. As the amount of genomic data in biological datasets increases, new methods are needed to analyze this data. Cutting-edge analyses may utilize genomes from tens of thousands of species. I present several methods for supertree and species tree estimation: ASTRID, FastRFS, SVDquest, and SIESTA. ASTRID can be used for both species tree and supertree estimation, and is designed to scale to very large datasets while maintaining a high level of accuracy. FastRFS is a supertree method that uses an exact constrained optimization algorithm to find accurate supertrees. SVDquest is a coalescent-aware species tree estimation method that estimates trees directly from sequences without using gene trees. Finally, SIESTA is a modification to the algorithms used by FastRFS, SVDquest, and other methods including ASTRAL that allows for the output and analysis of multiple optimal solutions, if they exist. For all these methods, I describe the algorithms used, along with a theoretical analysis of their running time and their statistical consistency. I also show results on biological and simulated data that demonstrate these methods’ effectiveness over a wide range of model conditions. In addition, I present the results of an experiment that compares various methods on trees simulated under both incomplete lineage sorting (ILS) and horizontal gene transfer (HGT).

Illinois Digital Environment for Access to Learning and Scholarship Repository

Optimizing tensor contractions for nuclear correlation functions

Author: Vachaspati Pranjal
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2014

Thesis: S.B., Massachusetts Institute of Technology, Department of Physics, 2014. Cataloged from PDF version of thesis. Includes bibliographical references (pages 37-38).

Nuclear correlation functions reveal interesting physical properties of atomic nuclei, including ground state energies and scattering potentials. However, calculating their values is computationally intensive due to the fact that the number of terms from quantum chromodynamics in a nuclear wave function scales exponentially with atomic number. In this thesis, we demonstrate two methods for speeding up this computation. First, we represent a correlation function as a sum of the determinants of many small matrices, and exploit similarities between the matrices to speed up the calculations of the determinants. We also investigate representing a correlation function as a sum of functions of bipartite graphs, and use isomorph-free exhaustive generation techniques to find a minimal set of graphs that represents the computation.

DSpace@MIT

Large scale phylogenomic estimation

Author: Vachaspati Pranjal

Simulated Data

Author: Pranjal Vachaspati (756609)

Sequences, estimated gene trees, estimated species trees, and true species trees for SIESTA simulated data.

FigShare

SVDquest failed datasets

Author: Pranjal Vachaspati (756609)

This fileset includes sequences that failed when run under SVDquartets+PAUP* with the message "No informative quartets were found in SVDQuartets analysis." This data is based on data originally generated for "ASTRAL-II: coalescent-based species tree estimation with many hundreds of taxa and thousands of genes" (Mirarab and Warnow 2015).

FigShare

Fast evaluation of multi-hadron correlation functions

Author: Detmold William
Vachaspati Pranjal
Publication venue: Sissa Medialab
Publication date: 18/03/2019

Calculating the values of nuclear correlation functions is computationally intensive due to the fact that the number of terms in a nuclear wave function scales exponentially with atomic number. To speed up this computation, we represent a correlation function as a sum of the determinants of many small matrices, and exploit similarities between the matrices to speed up the calculations of those determinants.

DSpace@MIT